A Super Phonetic System and Multi-dialect Chinese Speech Corpus for Speech Recognition
Abstract
In this paper, we describe our work on Chinese multi-dialect speech processing. Based on a phonetic analysis of ten Chinese dialects, we have created a Chinese super phonetic system for Chinese speech recognition. To examine this phonetic system and develop Chinese dialect speech technology, we are building a multi-dialect speech corpus that covers 10 dialect areas and 2000 speakers.
Related resources
Spoken language resources for Cantonese speech processing
This paper describes the development of CU Corpora, a series of large-scale speech corpora for Cantonese. Cantonese is the most commonly spoken Chinese dialect in Southern China and Hong Kong. CU Corpora are the first of their kind and intended to serve as an important infrastructure for the advancement of speech recognition and synthesis technologies for this widely used Chinese dialect. They ...
Dialect adaptation for Mandarin Chinese speech recognition
Many local or regional dialects exist in China. When the dialect used to train the system does not match the dialect of the user, recognition accuracy is poor. In this paper, we therefore investigate the development of a dialect-specific recognition system in Mandarin Chinese using standard adaptation techniques: a speaker-independent (SI) model trained on a source dialect (...
Chinese dialect identification using an acoustic-phonotactic model
In this paper we develop hidden Markov model (HMM) based approaches to identify Chinese dialects spoken in Taiwan. This task can be aided by exploiting various characteristic features of Chinese spoken languages. The baseline system performs phonotactic analysis after the speech utterance is tokenized into a sequence of five broad phonetic classes. The sequential statistics of the resulting sym...
Construct a multi-lingual speech corpus in taiwan with extracting phonetically balanced articles
In this paper, we describe an initial stage in constructing a multilingual speech corpus in Taiwan by selecting phonetically balanced scripts. The corpus is expected to cover the three most frequently used languages in Taiwan: Taiwanese (Min-nan), Hakka, and Mandarin Chinese. To achieve the objective, constructing a multilingual phonetic alphabet, namely For...
A set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese
This paper presents a set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese. A large speech corpus produced by a single speaker is used, and the speech output is synthesized from waveform units of variable lengths, with desired linguistic properties, retrieved from this corpus. Detailed methodologies were developed for designing “phonetically rich” and “prosodically ric...